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some discussion of how the method can be extended to the multiple dimensional case. 
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I. INTRODUCTION 

One may compute the, say, 90% confidence level "Poisson statistics" upper limit of a signal by finding the value 
of the expected number of events which would result in 90% of random experiments having more events in the 
entire experimental range than the number actually observed. This method ignores the distribution of the events. A 
difference between the observed and expected event distribution is a sign of background contaminating the signal, in 
which case one should be able to get a stronger upper limit by taking into account that difference. The "optimum 
interval" method [l[ is a way of estimating an upper limit in the presence of an unknown background. The limit 
from this method is based on selecting an interval within the experimental range which contains especially few 
events compared with what would have been expected from a true signal. The method therefore avoids parts of the 
experimental range with especially high background. This interval is then used to find an especially strong (low) 
upper limit, with the calculation taking into account the method by which the interval was selected. The result is 
a true, though possibly conservative, frequentist confidence level. But the method as published makes it practical 
only for the case of low statistics. With hundreds of events, it would take too much computing time to generate 
the necessary tables. For such a case, this note proposes an alternative procedure for producing an upper limit in 
the presence of unknown background. Call this procedure the "high statistics optimum interval method" , while the 
originally published procedure will be called the "low statistics optimum interval method" . 

Let's assume at first that events are characterized by some one-dimensional value, s. For example, s might be the 
"energy" of the event. Later the multidimensional case will be considered. The set of events consists of a "signal" as 
a function of s plus an unknown background. There may also be a known background, but that can be considered 
to be a part of the signal. The "size" of a proposed signal is characterized by the total number of events expected 
from it in the experimental range of s. If there is a known background, the total number of events expected from 
it can then be subtracted from the optimum interval upper limit to give the size of the part of the signal which the 
experiment is designed to measure. 

More formally, the total density distribution of the data within a one-dimensional experimental range character- 
ized by variable "s" is the sum of the distribution from a non- negative signal, p(s), plus an unknown non- negative 
background, pu{s): pt{s) = p{s) + pu( s )- Take the experimental range to run from s a to si,. Instead of using s 
to characterize events, make a change of variables to X(s) = J p(s). This new variable runs from X(s a ) = to 
X(sb) = P, the total number of signal events expected within the experimental range. Finding an upper limit on 
the signal means finding an upper limit on p. Call u x" the length of an interval in X, so that x = X(s') — X(s) is 
the expected number of events from the signal between s and s'. With the published low statistics optimum interval 
method, for each fixed number, n, one seeks the interval with n events which has the greatest x. Of all intervals with 
n events, the one with largest x is the optimum one for getting an upper limit in the presence of background because 
it tends to be the one with the lowest background relative to the signal. The larger the observed value of maximum 
x is for a given n, the stronger is the rejection of the assumed signal. For each n a Monte Carlo generated table is 
used to quantify how strongly the given signal is excluded. The probability that the largest length with < n events 
is less than x is called u C n (x; p)" . It is the tabulated measure of how strongly the assumed signal is excluded by the 
data. If C n (x; p), for x taken as the largest length in X with < n events, is 90%, then the assumed signal is excluded 
to the 90% confidence level. The "optimum" n is the one that gives the strongest exclusion. For this optimum n, the 
interval which most strongly rejects the signal is the "optimum interval" . Monte Carlo is again used to generate a 
table used to calculate the probability of the signal being excluded as strongly as it was by the optimum interval. 

Since x, n, and p are all defined in a way that is invariant under change of variables from the original s, this method 
is invariant under a change of variables. The method cannot be biased by how the experimenter chooses to bin data 
because the data are not binned. The method is not very sensitive to cuts in the range of s used to exclude high 
backgrounds because it automatically avoids regions of high background even if they are included in the range. But 
the Monte Carlo calculation of C n (x; p) can be time consuming for high n and p. 
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FIG. 1: Plots of C n (x; fi) (solid) and Cao{y; f) (dashed) for n — 20, fi = 50, y = (n— x)/y/x, and f = x/fi. 



II. EXTENSION TO HIGH STATISTICS 

The following statements are equivalent: "Largest length in X with < n events is < x." = "No interval of length 
x has < n events." = "The smallest number of events in intervals of length x is > n." Thus C n (x; fi) is also the 
probability that the smallest number of events in intervals of length x is > n. One way of formulating a Monte Carlo 
method of computing this interpretation of C n (x\ /i) is to first define F(s) = X(s)/fi, the cumulative probability 
distribution of s (the probability that a random signal event will have a smaller measured value than s). Then 
/ = xj n = (X(s') — X(s))/n = F(s') — F(s) is the difference in F between two points. Define y = (n — x)/^/x. This 
definition was chosen so that for sufficiently large x, y from any particular fixed interval of length x is approximately 
distributed according to a standard normal frequency distribution - a Gaussian with mean zero and unit standard 
deviation. Each Monte Carlo experiment takes the interval in F from to 1, generates events uniformly in it with 
probability density /i, and finds the interval whose / = xj \x has the smallest y. Call this smallest y u y m in" ■ With a 
large number of such Monte Carlo experiments one obtains the probability distribution of y m in- C n (x; /i) is then the 
probability that y m in > {n — x)/y/x. 

In the limit of very large /x and x, the result becomes what we'll call ll C oa (y m i n ; /)", and it's independent of fi. 
Gaussian Brownian motion can be thought of as the result of breaking a time interval into a huge number of equal 
small steps, each of which adds an independent Gaussian random contribution with zero mean and equal tiny variance. 
In the limit of an infinite number of infinitesimal steps, the resulting random path is denoted by "«;(<)". With w(0) 
initialized at zero, and the size of the variances chosen to give w(t + l) — w(t) a standard normal frequency distribution, 
w(t) is called a "standard Brownian process" or a "standard Wiener process". Coo(y; f) is then the probability that 

w(t + f)-w(ty 

is greater than y. It has been computed with a Monte Carlo program whose technical details arc described in Appendix 
A. The function is evaluated from a table whose interpolation in / is simplified by the empirical fact that if is 
tabulated in y' = y(l — 0.3 log(f)) — 1.7 log(f) then the resulting function of y' varies slowly with /. C OD (y; f) decreases 
as y increases with constant /, and it also decreases as / decreases with constant y. 

The definition of leads one to expect C n (x;fx) w C 00 (y- 1 f), where y = (n — x)/^fx and / = xj\i. This 
approximation is only valid for large n, but for large x, the probability of small n is negligeable. Figure [1] compares 
C 20 (a;;50) with C^y; f) for y = (20-x)/y/x and / = x/50. 

If one finds y m in for the data, Coo(y m i n ] f) says how strongly the data reject the assumed signal as being too high. 
One may then vary / in order to find the "optimum" one that gives the strongest upper limit. For a finite number 
of events in the data, call x m the largest gap between adjacent events. For this interval, y = —yfx m . Decreasing 
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FIG. 2: 90% confidence level value, Cmo,x(0.9, fmin, fi) as a function of fi for f m % n = 0.00, 0.02, 0.1, and 0.5. Points are 
generated by Monte Carlo. The smooth curves at high fi are fits of the form A + B/^/Jl to the points with fi > 3500. For 
fi < 54.5, CMax(0-9, 0, (j,) is shown as calculated for the low statistics optimum interval method. While some of the lack of 
smoothness of the low fi curve comes from statistical fluctuations of the Monte Carlo program output, most of it represents 
the true behavior of Cmcux, as is explained in Ref. pj|. 

/ below f m = x m / fj, will give a larger value of y with the smaller value of /, hence a smaller value of C x (y; /); so 
/ < fm cannot correspond to the "optimum" interval. The search for the optimum interval over all / is therefore 
equivalent to a search restricted to / > f m , the / corresponding to the largest gap between adjacent events. One 
might prefer restricting / to be greater than some even larger value, /mm, if on very small scales the event distribution 
is expected to be unreliable for experimental reasons. Or it may be that with a sufficiently large number of events, 
excessive computation time is needed to find a good approximation to the minimum y for all / > f m . So the following 
discussion will assume / is restricted to be greater than some f m in, which includes the case fmin = 0. 

In principle, the optimum interval is optimized over an infinite number of possible intervals. But for any finite 
number of events, only a finite number of intervals needs to be examined to find the optimum one. For a given 
number of events in the interval, it is the one with the largest /, because x = [if will then need the smallest fi to 
make the expected number of events, x, be too large for the observed number of events. So the only intervals which 
need to be considered are those which begin just after one event and end just before some other event. First find for 
each n the interval with n events and with the largest /, then compare the computed upper limit fi for each n and 
choose the smallest \i. 

Call "Cmoi" the maximum of C oa (y; f) over all intervals with / > f m %n- A Monte Carlo program can be used 
to compute the probability distribution of Cmo.x, and thus can give the true confidence level by which the signal 
is excluded. Call Cmo,x{C, f m in, fi) the value of Cmox for which the C confidence level is reached for the assumed 
values of / mm and \i. Figure [5] shows CW ax (0.9, / mm , [ij for various values of f m in- The lowest value of [i for which 
Cmux is defined when the confidence level is 90% is [i = 2.3026. The highest value of \i for which the C n of the low 
statistics method have been tabulated is [i = 54.5. Figure [2] shows results for the low statistics optimum interval 
when 2.3026 < fi < 54.5, and for the high statistics method when /j, > 54.5. Cmux has been tabulated only up to 
fi = 15310, but can be extrapolated to /i > 15310 using the form Cmux (C, fmin > A 4 ) = A + B/y/JI with A(C, f m in) and 
B(C, f m in) fit to the calculated results with ji > 3500. The extrapolation has been verified to within the accuracy 
available from the number of Monte Carlo generated model experiments at various values of /i up to 100000. 

Since the computation of y depends on the assumed value of fi, Cuax is sensitive to /i. The optimum interval upper 
limit is the signal size for which /i, the total number of expected events in the signal plus known background, satisfies 
Cmux — CuaxiC, fmin, Methods for solving this equation are described in Appendix B. 

By having a choice of f m ini one is exposed to the possibility of choosing f m in so as to push the upper limit in a 
desired direction. Those using this method should choose fmin once at an early stage and stick with it. In the absence 
of a good reason to choose a non-zero value, fmin = is preferable, because the smallest fmin makes use of the most 
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FIG. 3: Ratio between the median 90% confidence level upper limit and the true signal as a function of /j,, the expected number 
of events in the true signal, when there is no background. The solid jagged line shows the Poisson upper limit. It is jagged 
because of the discrete nature of the method - for most /j, it gives a stronger than 90% upper limit. The dash-dotted line almost 
covered by the Poisson line is the 90% confidence level upper limit for fmin = 0. It is almost the same as the corresponding 
limit for fmin = 0.2, which is not shown. 

information from the data. 

The low and high statistics optimum interval methods can be merged to form what now will be called the "optimum 
interval method" . In both the computation of Cuax and the computation of the table for Cmux, use C n for \i < 54.5, 
and use for // > 54.5. When the optimum interval method gives a result with /i < 54.5, that result is the 
same as for the low statistics optimum interval method; otherwise the result is the same as for the high statistics 
optimum interval method. In either case, the method for computing Cuax means, for example, that in the absence of 
background, and independent of the true value of the signal, a 90% confidence level upper limit has a 90% probability 
of being above the true value. This property of the method has been verified even near the discontinuity in Cuax at 
fi = 54.5. And of course the upper limit has a > 90% probability of being above the true value if there exists unknown 
background. 

III. COMPARISONS OF METHODS FOR COMPUTING UPPER LIMITS 

The optimum interval method can be compared with the Poisson statistics upper limit. For the case of no back- 
ground Fig. [3] shows the ratio between the upper limit and the true signal for these two ways of computing upper 
limits. For this figure (JMed denotes the median value of the computed upper limit signal size and OTrue is the true 
signal size. The notation "<r" is used because this method was derived to obtain an upper limit on a cross section, 
but in general a is just something proportional to the total expected number of events in the signal. 

Both methods give approximately the same result in the absence of background. As might be expected, when there 
is no background the Poisson upper limit tends to be slightly stronger. At those values of [i for which the optimum 
interval method is slightly stronger, the Poisson limit has a probability of less than 10% of finding an upper limit 
below the true signal size, while the optimum interval method has exactly 10% probability of making such a mistake. 

As an example of what the method does with a background distributed differently from the expected signal, consider 
a background that becomes negligeable at one end of the experimental range. To be more specific, suppose the part 
of the signal whose upper limit is to be found has density ps(s), and define X(s) = ps{t)dt. With the change of 
variables from s to A, this part of the signal is distributed uniformly in X. The example of background used when 
making Fig. 2] was, on the other hand, taken to have density proportional to X, rather than uniform in X. Fig. 0] 
compares the Poisson 90% confidence level upper limit, the optimum interval limit with f m i n = 0.2, and the optimum 
interval limit with f m i n = 0.0 for the case of only such a background, which in all but the bottom two dotted curves 




■5 




FIG. 4: Ratio between the median 90% confidence level upper limit and the background expected number of events as a function 
of the expected number of events in the background when there is no true signal. The jagged solid line above (JMed/crback = 1 
is the Poisson upper limit. The lower solid line is the optimum interval 90% confidence level upper limit for fmin = 0.0, and 
the dash-dotted line is for f m in = 0.2. The lower two dotted curves show the f m in = upper limit when half the background 
is known and when all of it is known. The dashed segment just above the solid line at fewer than 100 expected background 
events shows what the result would be for fmin = 0.0 if the pure high statistics optimum interval were used. 
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FIG. 5: Ratio between the median 95% confidence level upper limit and the median 90% confidence level upper limit signal 
as a function of the expected number of events. For the case of the background described in the text, the solid curve is the 
ratio for fmin = 0.0, and the dash-dotted one below it is for fmin = 0.2. Below the curves for the background case are two 
curves almost on top of each other showing the zero background case: a dotted curve for f m in = 0.2 and a dashed curve for 

fmin ~ 0.0. 
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is assumed to be completely unknown. In this case, the Poisson upper limit is much worse than the optimum interval 
limits. As might be expected, for high statistics, / m j„ = 0.0 (lower solid curve) gives a stronger upper limit than 
jrain = 0.2 (dot-dashed) because it's able to use a smaller optimum interval where background is especially small near 
one end of the experimental range. The bottom two dotted curves show what happens when half the background 
is considered known, so can be subtracted, and, for the lowest curve, when all the background is considered known. 
Appendix B gives more details on how background is subtracted. Using the high statistics optimum interval method 
with its Gaussian approximation, even when p, < 54.5, gives a result (dashed) slightly weaker than is given by the 
optimum interval method with the low statistics method for p < 54.5. 

Because the computation of Cuax for p > 15310 was done with a fit to the form A + B/yfp, one might worry that 
inaccuracies in the extrapolation will make results inaccurate. The effect of an inaccurate value for Cmux is to make 
the true confidence level somewhat different from the intended one. The probability of mistakenly getting a "90% 
confidence level" upper limit below the true signal should be exactly 10% when there's no unknown background. The 
Monte Carlo calculation used to make Fig. [3] used 50000 model experiments for p < 2500, 5000 model experiments 
for 2500 < p < 20000, and 500 for p = 100000. As would be expected for a correct computation of Cuax, mistakes 
occurred in 10% of the model experiments to within the statistical errors of the limited number of trials. Similar tests 
have been done for 95% and 99.5% confidence levels up to p sa 20000. Furthermore, upper limits for high p are not 
very sensitive to the choice of value for the confidence level. Figure [5] shows examples of the ratio between the 95% 
confidence level upper limit and the 90% confidence level upper limit. Curves are irregular largely because of the 
limited number of Monte Carlo experiments used in their calculation. 

IV. EXTENSION TO EVEN HIGHER STATISTICS 

The method described here may be extended to extremely high numbers of events. Today's desktop computers can 
compute the optimum interval upper limit for 10000 events in seconds. But since the computing time grows like the 
square of the number of events, with enough events the method becomes impractical. There's a way to approximate 
the optimum interval method in a way that instead grows linearly with the number of events. Recall that in Sections 
I and II evnts were characterized by some one-dimensional parameter, s, such as energy, with F(s) the fraction of 
events expected to be below s for the assumed event distribution without background. Bin the data in 1000 or more 
equal bins in F. The computer time needed to do the binning grows linearly with the number of events. Then consider 
only those intervals which consist of one or more consecutive bins. The computer time for finding the optimum such 
interval is independent of the number of events. The larger the number of bins, the closer is this method to the 
optimum interval method without binning. At this time, tables and software for this modification of the optimum 
interval method have not been produced. If they are produced, they will be for a particular choice of the number 
of bins of F. That choice can easily be independent of what the experimenters want their result to be; so choice of 
binning would not be a way experimenters could inadvertently bias results. 

V. EXTENSION TO MULTIPLE DIMENSIONS 

Some experimentalists need a generalization of the optimum interval method to more than one dimension. The 
impetus for producing this section and the corresponding Appendix C was a discussion^ with physicists who were 
already planning to write software extending the low statistics optimum interval method to two dimensions. Some 
ideas are discussed here concerning the extension to an arbitrary number of dimensions and to the case of high 
statistics, but these ideas have not yet been implemented in software. 

Suppose instead of data described in terms of a distribution in one-parameter, s, there are D dimensions, 
si, S2, Sfl, with the signal distributed according to some density function Pd(si, S£>). Appendix C shows 
how to transform coordinates from (si, ...,sn) — ► (f*i,---, r r>) m a w &y which maps po into a uniform distribution 
within the unit "generalized cube" with < r p < 1 for all 1 < p < D. As in the case of one dimension, call "p" 
the expected number of events in the entire range of the experiment, which in terms of coordinates r is the unit 
generalized cube. Any region within the unit generalized cube occupies fraction / of the generalized cube, and / is 
also the generalized volume of the region. The expected number of events within a region inside the unit generalized 
cube is x = pf. 

As in the case of one dimension, selection of the optimum region for an upper limit on po begins by choosing some 
set of allowable regions within the total experimental range. In the case of one dimension, the allowable regions are 
intervals. For each n find the allowable region with < n events having the largest expected number of events, x. If x 
is too large for the observed n, that means the assumed signal, po, is excluded by observation as being too large. For 
D dimensions take the allowable regions to be "GRPs" , generalized rectangular parallelepipeds with all edges parallel 
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to coordinate axes in r space. Explicitly, a GRP is a region of r space defined with some fixed set of a p < b p as the 
set of r for which a p < r p < b p for all 1 < p < D. It's a line segment in one dimension, a rectangle in two dimensions, 
and a rectangular parallelepiped in three dimensions. The generalized volume of a GRP is 

D 

f = Y[( b p- a p)- 

P =i 

"C n D(x; p)" is defined as the probability for D dimensions that the largest GRP with < n events would have gener- 
alized volume less than / = x/p, if the assumed po were correct. For D = 1 this definition is equivalent to the one for 
C n {x; p). As in the case of D = 1, C n D(x; p) is a measure of how strongly the assumed po is excluded by the data. 
Appendix C discusses how to evaluate C n D- 

The optimum GRP, the generalization of the optimum interval of one dimension, is the one for which x and n give 
the largest value of C n o{x 1 p), which one may call 11 Cmo,x" ■ Define u Cmo.x,d{C, p)" as the value such that a fraction 
C of random experiments with a model having that p (and no unknown background) will give Cuax < Cmo,x,d{C, p). 
C is the confidence level by which the model is excluded because the data show too small a signal. 

Cqo of one dimension can be generalized to CxjB. First notice that C n u{x;p) is also the probability that the 
smallest number of events in regions with expected number of events, x, is > n. As for the case of D = 1, define 
y = (n — x)/ \fx. If the smallest y in GRPs with generalized volume / is y m in, then C n o{x; p) is the probability that 
V-min > (n — x)/y/x. In the limit of large p and x, the probability distribution of y in any single GRP is Gaussian 
with zero mean and unit standard deviation. Now generalize standard Brownian motion to D "time" dimensions: 
Break the entire unit generalized cube into infinitesimal pieces, each of which contributes to any region containing 
the piece an independent signal distributed according to a Gaussian with zero mean and variance such that the entire 
unit generalized cube has a signal of unit variance. The standard deviation of the signal in a region with generalized 
volume / is y/~f. Thus in any region with generalized volume /, Signal/ y/~f has zero mean and unit standard deviation, 
exactly as does y = (n — x)j^px in the limit of high statistics. Define "Coolly;/)" to be the probability that for 
all GRPs of generalized volume / within the unit generalized cube, the Signal/y^/ is greater than y. Appendix C 
discusses its computation. For D = 1 its definition is the same as that of C^ of section II. 

Finally, C M ax{C, f min , p) can be generalized to Cmo.x,d(C, f mm , p)- If only f min = is considered, the same 
function might as well be written as 11 Cm<ix,d{C, p)" . It is defined in the same way as the function written the same 
way for the low statistics case. The difference is only that the low statistics function applies only for low p, while the 
high statistics function applies for high p. Cmo,x(C, p) can be computed with a Monte Carlo program for low p once 
the C n D functions are available for low p, and for high p once the CooD function is available. 

VI. CONCLUSIONS 

The optimum interval method has been extended to the case of high statistics by making a Gaussian approximation 
to the probability distribution of events in each subinterval of the experimental range. Even with this approximation, 
the method is a true, though possibly conservative, frequentist upper limit, with the probability of mistakenly getting 
too small a result being at most one minus the confidence level. 

Software and tables are available[|| for applying this method to actual data. Once 1000 events have been manip- 
ulated into a form appropriate for the software, it takes a 730 MHz Pentium III computer about 0.022 seconds to 
compute the upper limit. Computation time is approximately proportional to the square of the number of events. 

Although the extension of the optimum interval method to multiple dimensions can be computationally very 
intensive, and has not yet been implemented in software, ideas for how it should be done have been presented. 

APPENDIX A: TECHNICAL DETAILS OF THE COMPUTATION OF THE RELEVANT FUNCTIONS 

In the computation of Coo, each Monte Carlo trial begins by breaking the interval (0,1) into a very large number, N, 
of subintervals. For each subinterval generate an independent random value according to a standard normal frequency 
distribution. Restrict oneself to lengths / within (0,1) which are integer multiples of \/N . For each such / for which 
C 'oo (y rain'-, /) is to be computed, allow it at first to only be at positions within (0,1) such that its endpoints are on 
endpoints of subintervals. Such a length / interval contains N f subintervals. For any such length / interval, scale 
the contribution from all subintervals it contains by a factor of 1/y/Nf to produce a total signal y with zero mean 
and unit standard deviation. Move the interval of length / through (0,1) in steps of size 1/N, searching for y m i ni the 
smallest y. That's the result of this one Monte Carlo trial. Do a huge number of them, and tabulate the distribution 
of ymin for the various values of /. 
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With finite N, the computed estimate of is systematically shifted above its true value because the minimum 
signal occurs somewhere inside one of the tiny subintervals, not at an end of one as was assumed in the proposed 
Monte Carlo method. The size of this systematic shift is of order the standard deviation of the signal contribution 
to subintervals. I.e., it's of order 1/^/N. Thus the resulting distribution of y m in will be systematically shifted too 
high by an amount of order 1/^/N, and the result will not get arbitrarily accurate as the number of Monte Carlo 
experiments grows arbitrarily high. I expect the optimum time for a given accuracy would be achieved by choosing 
the number of Monte experiments to be proportional to N, so that both the systematic error from finite size of the 
subintervals and the statistical Monte Carlo error would decline like 1/W(N). The computing time then would grow 
like 1/accuracy 4 , instead of the less rapid growth 1/accuracy 2 usually characteristic of Monte Carlo calculations. 

To restore the Monte Carlo error to a 1/VN decline with N, correct for the systematic shift from finite N by 
including a Monte Carlo estimate of what the minimum y would be within each subinterval. I.e., pick a random value 
according to the probability distribution of the minimum within the subinterval. Call the endpoints of a subinterval 
"io" and a tx" , with t± — to — l/N. In the evaluation of Coo the Monte Carlo program finds y(t) = [w(t + /) — w(t)]/-\/f 
at t equal to to and t\. We want the distribution of what would be found if the minimization were also done over 
all points to < t < t\. To do this note that as the interval slides from t = to to t = t\ it loses infinitesimal random 
Gaussian contributions from its low end and gains them at its high end. Thus the change in y as t increases is itself 
a Gaussian Brownian process. When t increases by l/N , [w(t + /) — w(t)] loses a random contribution with variance 
l/N from its low end and gains an independent random contribution with variance l/N at its high end, for a total 
variance of 2/N. As t increases from to to t\, y = [w(t + /) — w(t)]/^/f changes by a random value with variance 
a 2 = 2/ (TV/). Form x from t by a shift and rescaling to put x = at t — to and x = 1 at t = t%. The change in y 
is proportional to that from a standard Brownian motion rescaled by a factor which makes the total variance for the 
change from x = to x — 1 be a 2 instead of unity. We may therefore write 

m = w(t + f) -w(t) = W (t + f)_ - W (t Q ) + (A1) 
V f V / 

where w(x) is a standard Brownian process with v = w(l) constrained to give the correct value of y{ti). To find 
the minimum of y(t) as t varies between to and t\, find the minimum of w(x) as x runs from to 1 subject to the 
constraint that w(l) — v. For the purpose of correcting the minimum in the Monte Carlo program with finite N, 
the minimum of w(x) over the range < x < 1 was chosen randomly from its probability distribution given fixed 
v = w(l). This probability distribution will now be derived. 

The minimum of w(x) for < x < 1 can be no larger than the minimum of and v, because w is equal to each 
of those values at the endpoints of its range. Consider only z < min(0, v). For a given such z the minimum of w(x) 
for 0<a;<lis<zif and only if somewhere between x = and x — 1 w(x) crosses z. Call ll x z " the value of x for 
which w{x) first crosses z. Define wr(x) to be the same as w{x) when x < x z , but for x > x z reverse the sign of each 
infinitesimal Gaussian contribution. The new function is reflected through z for x > x z . Since reversing the sign of 
a Gaussian contribution with zero mean leaves it a Gaussian contribution with zero mean, wr(x) is also a standard 
Brownian process. For x > x z , wr(x) — z — (w(x) — z) ~ 2z — w(x), and its first crossing of z is at the same x z as w. 
So any w(x) for which the minimum is < z and for which w(l) — v defines a wr(x) for which wr(1) — 2z — v. Since 
wr(1) — 2z — v < z < = wr(0), any wr(x) for which wr(1) = 2z — v must cross wr(x) = z. Call the first such 
value of x The reflection of wr about z for x > x z then defines a w(x) whose minimum is below z. There is, 

therefore, a one-to-one correspondence between a) standard Brownian processes whose minimum between x — and 
x = 1 is below z < min(0,v) and for which w(l) = v and b) standard Brownian processes for which wr(1) = 2z — v. 
The probability distribution of wr(1) is the normal frequency distribution. So the probability of the minimum of w(x) 
being below z in the range < x < 1 for w(l) within some tiny 5 of v is proportional to S exp(—(2z — v) 2 /2). The 
probability of w(l) being within the same 8 of v is proportional to 6 exp(— v 2 /2), with the same normalizing factor. 
Since the probability of A given B is equal to the probability of (A and B), divided by the probability of B, we have 
the probability that the minimum of w(x) is < z given w(l) = v is 

-(2 Z -vf/2 

P= e -„ va =e 2 'C-»>. (A2) 

Various mathematical references [H, Q on stochastic processes give similar, but more rigorous, derivations of equations 
related to IA2I Random z will have the desired probability distribution if one first chooses random P uniformly over 



(0,1), then solves the equation relating P and z for z: z = \ v — \J v 2 — 2 ln(P)J /2. Replace ui(x) with z in Eq. 

IA1I to get a Monte Carlo minimum y(t) over the range to < t < t\. Although this method eliminates most of the 
systematic error which would otherwise be caused by the finite value of N, large N is still desireable because assuming 
independently random z for different subintervals ignores some correlations. 



9 



APPENDIX B: SOLVING C Uax = C Max 



The data for which the equation CMax = CMax is to be solved can be expressed as a set of values of the cumulative 
probability, F — X/fi, as introduced in section II. Assume the signal can be expressed as a sum of a part for which 
one wants an upper limit plus a part consisting of a known background. The total expected number of events below s 
is X(s) = Xs{s) + Xb(s); Xs(s) is the expected number below s from the part of the signal whose upper limit is to be 
determined, and Xg(s) is the expected number below s from the known background. Xs(s ) = fig, Xg(sb) = (J<b, and 
fj, = fis + A*s- For the known background, fis is assumed to be known. The set of F for the events can be computed 
for any trial value of \i from F = (1 — Hb/^)Fs + {hb/h)Fb, where F$ = Xs/ns and Fb — Xb/hb- From the trial 
value of fj, and from the resulting set of F one may find CMax, and one may compare it with Cmux(C, / mm , /i). Once 
a /i has been found which makes the two equal, subtraction of the known background is done by converting the upper 
limit on fj, into one on /15 = fi — fis- 

The equation Cmux = Cmo,x(C, fmin> A 4 ) f° r the fj, corresponding to the upper limit can be solved using CERNLIB[6] 
routine RZERO, which finds the zero of a function of one real variable. This routine can also be used for the upper 
limit in the case of high statistics. But for high fj,, and with no background subtraction, there's another method which 
usually converges faster. 

The optimum interval is the one which requires the lowest upper limit fi to be excluded at confidence level Cooijj; /). 
For the optimum interval with / > f m i n , C 00 (y; f) = Cm<ix(C, f m in, A*) for the upper limit value of /1. The inverse 
function of CMax = Coo(y;f) is y = yoo{CM ax \f)- From y = (n - x)/y/x = (n — fJ.f)/VJTf one may then solve for 
fi. The interval with / > f m in with the lowest /i is the optimum interval, and fip = \i — fis is the "high statistics 
optimum interval method" upper limit which we seek. Since CMax does have some dependence on /i, an iterative 
procedure is needed: guess an initial fi and find its corresponding CMax- Then compute the upper limit n, and take 
this improved estimate for /1 to get an improved estimate for CMax in the next iteration. 

The function Coo(y; /) was computed with a Monte Carlo program only for 0.01 < / < 1.0. Its inverse, t/oo(C; /), 
is also only tabulated for / in that range. But for a sufficiently large number of events in the experiment, the high 
statistics optimum interval method requires evaluating yoo(C; /) for / < 0.01. 

For finding the minimum y of all intervals with expected fraction / of the events, continuously move the interval 
of size / along it, while seeing if the current y is the lowest. C 00 (y 00 ; f) is the probability that the lowest y is greater 
than j/oq. Now imagine the entire experimental range broken into p pieces. The intervals which are fractions / of 
the whole range are fraction /o = pf of each piece. Each of the p pieces has probability C 00 (y 00 ] fo) of having its 
lowest y greater than y^. The probability of all the p pieces having lowest y greater than y x is CP (y 00 ; fo)- These 
considerations lead to the approximation 

Coofa/JwC&fa/o), (Bl) 

with p — /o/ /. This approximation for Coo(y; /) is not exact because the measurement of minimum y separately in 
each piece misses those size / intervals which overlap two adjacent pieces. The approximation can be improved by 
decreasing the effective number of / length intervals in the whole region, 1//, by a little bit for each of the p — 1 
boundaries between pieces. I.e, the approximation will be better if 1/ / = p(l//o) — s(p— 1), for some s of order unity. 
Solving for p gives 

P-fe <B2) 
V/0 - s 



Solving both sides of Eq. IB 1 1 for y gives 



yoo(C;f)^y oo (C 1 ^-J ). (B3) 



Empirically, s = 0.94 works fairly well, according to tests with relatively large /q for 0.01 < / < 1, but for the 
smaller values of /o, it's hard to distinguish between various choices of order unity for s. 

Although Eqs. IBll IB21 and IB3I were motivated by choosing p to be an integer number of pieces, they can be 
generalized to non-integer p. To see this, suppose that not only does fo = pf for integer p, but f\ — qf for 
another integer, q. Then Eq. IBll implies Coo(y; /) ~ C^ {y; f\) which, along with Eq. IBll gives for integer p and q 

Coo(y,fl) ~ C%l q (y;f ), where 

p/q = (1// - *)/(l//o -s)_ 1/h - s 



(l//-s)/(l//i-s) l//o 



These equations are equivalent to Eqs. IBll and [B2l with integer p — + fraction p/q. 
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TABLE I: Comparison of the approximation of Eq. IB3l for various values of fo- 



f 


C 


fo = 0.01 


fo = 0.02 


fo = 0.04 


fo = 1.0 


0.0005 


0.93 


-4.66 


-4.65 


-4.65 


-4.63 




0.96 


-4.79 


-4.77 


-4.77 


-4.75 




0.99 


-5.09 


-5.09 


-5.12 


-5.02 


0.0020 


0.93 


-4.33 


-4.34 


-4.34 


-4.33 




0.96 


-4.47 


-4.48 


-4.47 


-4.46 




0.99 


-4.79 


-4.77 


-4.77 


-4.75 


0.0040 


0.93 


-4.16 


-4.16 


-4.16 


-4.18 




0.96 


-4.30 


-4.31 


-4.30 


-4.30 




0.99 


-4.63 


-4.63 


-4.63 


-4.61 


0.0100 


0.93 


-3.92 


-3.91 


-3.91 


-3.96 




0.96 


-4.07 


-4.07 


-4.07 


-4.10 




0.99 


-4.42 


-4.42 


-4.42 


-4.41 



The extrapolation of Eq. IB3I from fo to / may be poor if / is too small, because C 1//p may be too close to unity 
for the table used by y^ to give an adequate approximation. In such a case one may fall back to using / = 1, for 
which Coo and yoo can be calculated without resorting to Monte Carlo generated tables. The probability distribution 
of y for / = 1 is the normal frequency distribution, for which publicly available programs can compute the integral 
and the integral's inverse. Examples are DFREQ and DGAUSN of CERNLIB[6j. For fo = l the accuracy for low / 
is somewhat improved by using p = (1// — 0.946)/0.051 instead of Eq. IB2I with s — 0.94. This modified form for 
p is used for fo = 1 in tabic HI which shows that the extrapolation is relatively independent of the value of fo used. 
The Monte Carlo programs used to generate the tables used by software implementing the high statistics optimum 
interval method often had to extrapolate from fo = 0.01, but have so far never required extrapolation from fo = 1. 

APPENDIX C: TECHNICAL DETAILS FOR THE EXTENSION TO MULTIPLE DIMENSIONS 

The optimum interval method in multiple dimensions begins with a transformation of coordinates from initial 
coordinates, si,...,sd, with signal density po{s), into ones for which the entire experimental range is in a unit 
generalized cube with uniform transformed density. Define p p for < p < D — 1 inductively by 

/+oo 
dtp p (si,...,Sp^i,t), (CI) 
-oo 

and then define r p for 1 < p < D by 

1 f Sp 

r p (si,...,s p ) = / dtp p (si, ...,s p _i,t). (C2) 

Pp-l J-oo 

The expected number of events in the entire experiment is p — /Jo- 
in order for a Monte Carlo program to compute C h d{x, p), and in order to evaluate the optimum generalized 
rectangular parallelepiped (GRP) for the actual data, it is necessary to find for each number of events, n, the 
maximum generalized volume of a GRP for Monte Carlo or real data. To be a candidate for a maximal generalized 
volume GRP for a given n, each generalized face must butt up against one of the event points; otherwise n could be 
kept constant while expanding the GRP along the normal to that generalized face until an event point is encountered. 

An algorithm is needed for the computer program to find all GRPs which could have the maximal generalized 
volume for each n. For this purpose, assume there are M events ordered so that si(i) increases with i as i runs from 
1 to M. For any non-negative po, Eq |C2l implies that if the s\{i) are in increasing order, so are the ri(i). To simplify 
allowing boundaries of the unit generalized cube to also be GRP boundaries, define two additional points, i — and 
i = M + 1, with r p (0) = and f p (M + 1) = 1 for all 1 < p < D. Define each GRP by the set of a p = r p (i p i) and 
b p = r p (i P H), where the i p l are the points which the low r p generalized faces of the GRP touch and the i p h are the 
points the high r p generalized faces touch. For p — 1 in, and i\n can be any pair of points with < i\L < i\H < M+l. 
For p > 1, ipi = and/or i p H = M + 1 arc allowable for a candidate maximal GRP. Other values of i p i and i p h for 
p > 1 must for all 1 < q < p satisfy the following criteria: 



r P (i P L) < r p (i qL ), r p (i qH ) < r p (i pH ), 
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and 



Tqiiqh) < r q {ipL), r q (i pH ) < r q (i qH )- 



A simple way to restrict the values of i p l and i p u that need to be checked is to note that for q = 1, this last condition 
is equivalent to 



Once an acceptable GRP is found, One may then go through all 1 < i < M points and count the ones inside the GRP, 

1. e., the ones which for all 1 < p < D satisfy r p (i p L) < r p (i) < r p (i p H)- But for p = 1 that condition is equivalent to 
i\L < i < iiH, so only those values of i need to be considered as possibly being inside the GRP. When the number of 
points inside the GRP is n, the GRPs volume is a candidate for being the largest for that value of n. This algorithm 
can be applied to the data, and also can be used with a Monte Carlo program which repeatedly generates sets of 
events uniformly in the unit generalized cube in order to compute the functions C n o(x; p). 

A Monte Carlo program to evaluate Coqd can begin with a unit generalized cube divided into N D tiny generalized 
sub-cubes, each with side of length 1/N for some large integer N. For each Monte Carlo trial, give every generalized 
sub-cube an independent random value distributed according to a Gaussian with zero mean and unit standard devi- 
ation. Consider only GRPs defined by a p < r p < b p for all 1 < p < D with the a = (oi, a p ) and b = (pi, b p ) on 
generalized sub-cube vertices. A generalized volume / GRP contains N D f generalized sub-cubes. For any such GRP, 
scale the contribution from all the generalized sub-cubes it contains by a factor of 1/ \J N D f to produce a total signal 
y with zero mean and unit standard deviation. Consider all GRPs with / within some bin for which C OC D(y] /) is to 
be computed, with a on generalized sub-cube vertices, and with the GRP entirely inside the unit generalized cube. 
The optimum GRP for the given / bin is the one with smallest y — y m i n - Do a huge number of such Monte Carlo 
trials, and tabulate the distribution of y m i n for the various bins of /. 

As for the D = 1 case, there's a systematic upward shift in the computation of C^o caused by use of finite N. 
Correct for this systematic shift by estimating what y m in would be if the a of GRPs could be shifted anywhere within 
each generalized sub-cube, instead of being restricted to generalized sub-cube vertices. The random minimum should 
have the distribution expected for Gaussian Brownian motion in multiple "time" dimensions within the generalized 
sub-cube, given the GRP signal for a at each of the 2 D vertices of the generalized sub-cube. Let's now discuss how 
to choose a random signal with the correct distribution of the minimum within each generalized cube. 

Call "So" the GRP signal when a is at the vertex of the generalized sub-cube with lowest r p for all 1 < p < D. 
Call "Sfe" the GRP signal when the GRP position is shifted to another generalized sub-cube vertex for which only 
r-fc changes, leaving all other r p alone. I.e, the shift is parallel to axis k of the unit generalized cube. As the GRP is 
continuously shifted, the signal continuously changes by adding infinitesimal generalized volumes from one generalized 
face of the GRP and removing infinitesimal generalized volumes from the opposite generalized face. Thus for such 
shift the signal changes according to S(tfc) = So + <7jfeiffe(ife). In this equation, tk is ru shifted and rescaled so that 
it runs from to 1 as a moves from one generalized sub-cube vertex to another along the k direction. The Wk(tk) 
is a standard Wiener process, and Ok is the total standard deviation of the signal change caused by a shift from one 
generalized sub-cube vertex to its neighbor. A total shift of bk ~ a,k would move the GRP exactly its total size. It 
would subtract a random signal with unit standard deviation from the original GRP and add an independent random 
signal with unit standard deviation to the shifted GRP. The change in random signal would have variance equal to 

2. A shift by a fraction of bk — ak would change the signal by a random value whose variance is the same fraction of 
2. Therefore a shift by 1/N from one generalized sub-cube vertex to its neighbor along the k direction changes the 
signal by a random value with variance 



In the approximation that the generalized sub-cube is very small, a shift to an arbitrary position within the generalized 
sub-cube gives a signal 



The Wk functions are independent standard Wiener processes. Each one's minimum can be chosen randomly according 
to the distribution of Eq. IA2I with v — Vk — (Sk — So)/<Tfc. So choose a set of independent random Pk, each uniform 



over (0, 1); then choose Zk = (Ufe — y/v\ — 1 ln(Pk) ) /2 and take the minimum signal within the generalized sub-cube 



iiL < ipLi ipH < Hh- 




D 



(JkWk(tk)- 




to be S + Yjk=i a >°Zk- 
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There are two remaining effects of the above described Monte Carlo procedure which systematically shift Coqd 
from its intended value for D > 1. One such effect comes from binning in /, rather than using exact values of / as 
is possible with D = 1. As described, the Monte Carlo program finds the smallest y in a range of /, which will be 
smaller than the smallest y for a fixed value of /. This downward shift in y gets smaller for smaller bins in /. Another 
source of a systematic shift in Coqd is in the opposite direction. The Monte Carlo program can only consider GRPs 
whose sides are all integer multiples of 1/iV. The true minimum of y is almost surely less than the minimum for such 
a restricted set of GRPs; so this limitation of the method shifts y up from its intended value by an amount which 
gets smaller for larger N. Neither of these two shifts would occur if the GRPs were restricted to be generalized cubes, 
but such a restriction would weaken the method's ability to avoid backgrounds which tend to be concentrated near 
the end of the range of one of the D variables while being relatively uniform over the range of other variables. One 
way of correcting for these two sources of shift would be to generate tables using various bin sizes in / and various 
values of N; then extrapolate to zero bin size in / and infinite N. Perhaps better methods can be found to improve 
the calculation of C^d • But any such improvement is probably unnecessary. Use of an approximate result need not 
have a significant effect on computed upper limits. If the same approximate function used in place of Cood with the 
data is also used in the Monte Carlo computation of Cmo,x,d(C, /j,), the result purporting to be a C confidence level 
will still be a true C confidence level, perhaps made conservative by the presence of an unknown background. 
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